Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems
Abstract
Classic database indexes (e.g., B-Tree), although they speed up queries, suffer from two main drawbacks: (1) An index usually incurs 5% to 15% additional storage overhead, which results in a non-negligible dollar cost in big data scenarios, especially when deployed on modern storage devices. (2) Maintaining an index incurs high latency because the DBMS has to locate and update the index pages affected by changes to the underlying table. This paper proposes Hippo, a fast yet scalable database indexing approach. It significantly shrinks the index storage and mitigates maintenance overhead without compromising much on query execution performance. Hippo stores disk page ranges instead of pointers to the tuples of the indexed table, which reduces the storage space occupied by the index. It maintains simplified histograms that represent the data distribution and adopts a page grouping technique that groups contiguous pages into page ranges based on the similarity of their index key attribute distributions. When a query is issued, Hippo uses the page ranges and histogram-based page summaries to identify the pages whose tuples are guaranteed not to satisfy the query predicates, and inspects only the remaining pages. Experiments based on real and synthetic datasets show that Hippo occupies up to two orders of magnitude less storage space than the B-Tree while still achieving query execution performance comparable to that of the B-Tree for selectivity factors of 0.1% to 1%. The experiments also show that Hippo outperforms BRIN (Block Range Index) in executing queries with various selectivity factors. Furthermore, Hippo achieves up to three orders of magnitude less maintenance overhead and up to an order of magnitude higher throughput (for hybrid query/update workloads) than its counterparts.
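The mechanism described in the abstract (page ranges summarized by histogram buckets, then pruned at query time) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation. The names `bucket_of`, `build_index`, `range_query`, and the `density` parameter are hypothetical. Python sets stand in for whatever compact summary the real index uses, and a simple "fraction of buckets covered" threshold stands in for the paper's distribution-similarity grouping criterion.

```python
from bisect import bisect_right


def bucket_of(value, boundaries):
    """Return the index of the histogram bucket that contains value.

    boundaries is a sorted list of split points; bucket i covers
    [boundaries[i-1], boundaries[i]), with open-ended first and last buckets.
    """
    return bisect_right(boundaries, value)


def build_index(pages, boundaries, density=0.2):
    """Group contiguous pages into ranges, each summarized by a set of bucket ids.

    Assumption: a range is closed once its summary covers at least
    `density` of all buckets (a stand-in grouping rule for this sketch).
    """
    n_buckets = len(boundaries) + 1
    entries = []  # each entry: (first_page, last_page, frozenset of bucket ids)
    start, seen = 0, set()
    for page_no, page in enumerate(pages):
        seen.update(bucket_of(v, boundaries) for v in page)
        if len(seen) >= density * n_buckets:
            entries.append((start, page_no, frozenset(seen)))
            start, seen = page_no + 1, set()
    if start < len(pages):  # flush the trailing, not-yet-full range
        entries.append((start, len(pages) - 1, frozenset(seen)))
    return entries


def range_query(pages, entries, boundaries, lo, hi):
    """Return (matching values, number of pages inspected) for lo <= v <= hi."""
    wanted = set(range(bucket_of(lo, boundaries), bucket_of(hi, boundaries) + 1))
    hits, inspected = [], 0
    for first, last, buckets in entries:
        if buckets.isdisjoint(wanted):
            continue  # no tuple in this page range can satisfy the predicate
        for page in pages[first:last + 1]:
            inspected += 1
            hits.extend(v for v in page if lo <= v <= hi)
    return hits, inspected


if __name__ == "__main__":
    pages = [[1, 2, 3], [4, 5, 6], [50, 51, 52],
             [53, 54, 55], [90, 91, 92], [3, 95, 60]]
    boundaries = [10, 20, 30, 40, 50, 60, 70, 80, 90]
    entries = build_index(pages, boundaries)
    print(entries)
    print(range_query(pages, entries, boundaries, 91, 95))
```

The index stores one entry per page range rather than one pointer per tuple, which is where the storage saving comes from. The price is that candidate pages must still be scanned and filtered, since a bucket match only means a page range might contain qualifying tuples.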
Similar resources
Indexing the Pickup and Drop-Off Locations of NYC Taxi Trips in PostgreSQL - Lessons from the Road
In this paper, we present our experience in indexing the drop-off and pick-up locations of taxi trips in New York City. The paper presents a comprehensive experimental analysis of classic and state-of-the-art spatial database indexing schemes. The paper evaluates a popular spatial tree indexing scheme (i.e., GIST-Spatial), a Block Range Index (BRIN-Spatial) provided by PostgreSQL as well as a new...
Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing
The constant flux of data and queries alike has been pushing the boundaries of data analysis systems. The increasing size of raw data files has made data loading an expensive operation that delays the data-to-insight time. Hence, recent in-situ query processing systems operate directly over raw data, alleviating the loading cost. At the same time, analytical workloads have increasing number of ...
A Superimposed Codeword Indexing Scheme for Handling Sets in Prolog Databases
While there has been growing interest in the use of Prolog for database applications, the size of these applications is limited by the capabilities of current Prolog systems for handling disk resident clauses. A major impediment is the inordinate amount of time required for retrieval and unification of clauses from a large set stored on disk. Indexing is commonly used in conventional database s...
Lightweight Indexing for Log-Structured Key-Value Stores
The recent shift towards write-intensive workloads on big data (e.g., financial trading, social user-generated data streams) has driven the proliferation of log-structured key-value stores, represented by Google’s BigTable [1], Apache HBase [2] and Cassandra [3]. While providing key-based data access with a Put/Get interface, these key-value stores do not support value-based access methods, which...
Use of Transforms for Indexing in Audio Databases
The phenomenal increases in the amounts of audio data being generated, processed, and used in several computer applications have necessitated the development of audio database systems with newer features such as content-based queries and similarity searches to manage and use such data. Fast and accurate retrievals for content-based queries are crucial for such systems to be useful. Efficient cont...
Journal: PVLDB
Volume: 10
Publication date: 2016